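Federated learning (FL) is a new distributed machine learning framework that enables reliable collaborative training without collecting users' private data. However, because of FL's frequent communication and average aggregation strategy, it struggles to scale to statistically diverse data and large-scale models. In this paper, we propose a personalized FL framework called Tensor Decomposition based Personalized Federated learning (TDPFed), in which we design a novel tensorized local model with tensorized linear layers and convolutional layers to reduce the communication cost. TDPFed uses a bi-level loss function to decouple personalized model optimization from global model learning by controlling the gap between the personalized model and the tensorized local model. Moreover, an effective distributed learning strategy and two different model aggregation strategies are designed for the proposed TDPFed framework. Theoretical convergence analysis and thorough experiments demonstrate that TDPFed achieves state-of-the-art performance while reducing the communication cost.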
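Why a tensorized layer cuts communication can be seen from parameter counts alone, since each client uploads its layer parameters every round. The sketch below assumes a tensor-train (TT) matrix format, where core k has shape r_{k-1} x m_k x n_k x r_k; the layer size, mode factorization, and ranks are hypothetical, and the abstract does not say which tensor decomposition or ranks TDPFed actually uses.

```python
from math import prod


def dense_params(in_modes, out_modes):
    """Parameters of an ordinary dense weight of shape prod(in_modes) x prod(out_modes)."""
    return prod(in_modes) * prod(out_modes)


def tt_params(in_modes, out_modes, ranks):
    """Parameters of a TT-matrix whose core k has shape r_{k-1} x m_k x n_k x r_k.

    `ranks` holds the K-1 internal TT ranks; the boundary ranks r_0 = r_K = 1
    are added here.
    """
    if len(in_modes) != len(out_modes) or len(ranks) != len(in_modes) - 1:
        raise ValueError("need K input modes, K output modes and K-1 internal ranks")
    r = [1, *ranks, 1]
    return sum(
        r[k] * m * n * r[k + 1]
        for k, (m, n) in enumerate(zip(in_modes, out_modes))
    )


# Hypothetical 1024 x 1024 layer, factorized as (4, 8, 32) x (4, 8, 32) with TT ranks (8, 8).
in_modes, out_modes, ranks = (4, 8, 32), (4, 8, 32), (8, 8)
dense = dense_params(in_modes, out_modes)
tt = tt_params(in_modes, out_modes, ranks)
print(dense, tt, f"about {dense / tt:.0f}x fewer values to upload per round")
```

Lower TT ranks shrink the upload further but constrain what the local model can express. That trade-off is the reason a separate, unconstrained personalized model is useful.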
A common scenario in Multilingual Neural Machine Translation (MNMT) is that translation tasks arrive sequentially and the training data of previous tasks is unavailable. In this scenario, current methods suffer heavily from catastrophic forgetting (CF). To alleviate CF, we investigate life-long learning methods based on knowledge distillation. Specifically, in the one-to-many scenario, we propose a multilingual distillation method that makes the new model (student) jointly learn the multilingual output of the old model (teacher) and the new task. In the many-to-one scenario, we find that direct distillation faces an extreme partial distillation problem, and we propose two different methods to address it: pseudo input distillation and reverse teacher distillation. Experimental results on twelve translation tasks show that the proposed methods better consolidate previous knowledge and sharply alleviate CF.
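The phrase "jointly learn the multilingual output of the old model and the new task" can be illustrated with a generic per-token distillation loss. It mixes cross-entropy on the new task's gold token with cross-entropy against the teacher's output distribution. This is a minimal sketch under that assumption. The mixing weight `alpha` is hypothetical, and the paper's multilingual, pseudo-input, and reverse-teacher distillation variants are not reproduced.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, gold_index, alpha=0.5):
    """Generic word-level knowledge-distillation loss for one target token.

    alpha weights the new-task cross-entropy on the gold token; (1 - alpha)
    weights the cross-entropy between the teacher (old model) distribution
    and the student distribution, which preserves previously learned outputs.
    """
    p_student = softmax(student_logits)
    p_teacher = softmax(teacher_logits)
    new_task = -math.log(p_student[gold_index])
    keep_old = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))
    return alpha * new_task + (1 - alpha) * keep_old


# One decoding step over a toy 3-word vocabulary.
print(distillation_loss([2.0, 0.5, -1.0], [1.5, 1.0, -0.5], gold_index=0, alpha=0.5))
```

With `alpha = 1` the loss reduces to ordinary fine-tuning on the new task, which is the setting in which catastrophic forgetting is most severe.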
In recent years, deep-learning-based approaches have been applied to time-series forecasting problems. These methods have demonstrated impressive performance on univariate and low-dimensional multivariate forecasting tasks. However, when they are applied to high-dimensional multivariate forecasting problems, their performance is severely limited by practical training-time and GPU-memory budgets. In this paper, inspired by a change of basis in Hilbert space, we propose a flexible data feature extraction technique that excels at high-dimensional multivariate forecasting tasks. Our approach was originally developed for the National Science Foundation (NSF) Algorithms for Threat Detection (ATD) 2022 Challenge. Implemented using the attention mechanism and a Convolutional Neural Network (CNN) architecture, our method demonstrates strong performance and compatibility. Our models trained on the GDELT Dataset finished in 1st and 2nd place in the ATD sprint series and show promise for other time-series forecasting datasets.
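A change of basis can compress a series by representing it through its inner products with an orthonormal basis and then keeping only the leading coefficients. The sketch below uses a fixed orthonormal DCT-II basis purely as an illustrative stand-in. The abstract does not say which basis the authors use or whether it is learned, and all function names here are hypothetical.

```python
import math


def dct_basis(n):
    """Rows are the orthonormal DCT-II basis vectors of R^n (a fixed change of basis)."""
    basis = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        basis.append([scale * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)])
    return basis


def to_coefficients(x, basis, keep):
    """Coordinates of x in the new basis: inner products with the first `keep` vectors."""
    return [sum(b_i * x_i for b_i, x_i in zip(b, x)) for b in basis[:keep]]


def from_coefficients(coeffs, basis):
    """Map coordinates back to the original (time-domain) basis."""
    n = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coeffs, basis)) for i in range(n)]


window = [1.0, 2.0, 0.5, -1.0, 3.0, 0.0, 2.5, 1.5]  # toy length-8 window of one variable
B = dct_basis(len(window))
features = to_coefficients(window, B, keep=3)  # 8 values -> 3 features
print(features)
```

Applied per variable, this turns each long window into a few coefficients. A downstream attention or CNN model then works on a much smaller input, which is one way to address the memory pressure described above.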
This paper provides an introductory survey of GPT-3. We cover some of the historical development behind this technology and some of the key features of GPT-3, and we discuss the machine learning model and the datasets used. We survey both academic and commercial efforts applying GPT-3 in diverse domains such as conversational AI chatbots, software development, creative work, domain knowledge, and business productivity. We discuss some of the challenges GPT-3 faces, such as training complexity, bias, and hallucination/incorrect answers. We also discuss future research opportunities in this area.
Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can achieve real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with this NPU, reaching up to 60 FPS when reconstructing Full HD resolution images. This paper provides a detailed description of all models developed in the challenge.
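An INT8 model stores each tensor as 8-bit integers plus a floating-point scale. The sketch below shows symmetric per-tensor quantization, where q = round(x / scale) is clamped to [-127, 127]. It is an assumption for illustration: real NPU toolchains may instead use asymmetric (zero-point) or per-channel schemes, and the challenge's exact quantization setup is not given here.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization: q = round(x / scale), clamped to [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q, scale):
    """Recover approximate float values from INT8 codes."""
    return [qi * scale for qi in q]


weights = [-0.82, -0.10, 0.0, 0.33, 0.91]  # toy conv weights
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
print(q, scale, max(abs(a - w) for a, w in zip(approx, weights)))
```

Because the largest magnitude maps to 127, the rounding error per value is at most scale / 2. This is why quantization-aware training or careful activation ranges matter for high-quality upscaling.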
Batch Normalization (BN) is an important normalization step in many deep learning applications. Since it is a data-dependent process, for some homogeneous datasets it is redundant or even degrades performance. In this paper, we propose an early-stage feasibility assessment method for estimating the benefit of applying BN to given data batches. The proposed method uses a novel threshold-based approach to classify the training data batches into two sets according to their need for normalization, which is decided based on the feature heterogeneity of each batch. The proposed approach is a pre-training process and therefore adds no training overhead. Evaluation on the MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets shows that the proposed approach outperforms traditional BN, mostly at small batch sizes. Additionally, network stability is increased by reducing the occurrence of internal variable transformation.
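The threshold-based split can be sketched as scoring each batch's feature heterogeneity before training and routing only high-scoring batches through normalization. The abstract does not define the heterogeneity measure. The score below, which adds the spread of per-feature means to the spread of per-feature standard deviations, is a hypothetical choice, and so is the threshold value.

```python
from statistics import mean, pstdev


def feature_heterogeneity(batch):
    """Hypothetical heterogeneity score for a batch (a list of samples, each a list of features).

    Measures how much the features differ in location (means) and scale (stds).
    """
    columns = list(zip(*batch))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return pstdev(means) + pstdev(stds)


def needs_normalization(batch, threshold):
    """Decide, once and before training, whether this batch should go through BN."""
    return feature_heterogeneity(batch) > threshold


def split_batches(batches, threshold):
    """Classify batches into (normalize, skip) sets; no cost is added to the training loop."""
    normalize, skip = [], []
    for b in batches:
        (normalize if needs_normalization(b, threshold) else skip).append(b)
    return normalize, skip


homogeneous = [[1.0, 1.1], [0.9, 1.0]]
heterogeneous = [[0.0, 100.0], [1.0, 300.0]]
print(split_batches([homogeneous, heterogeneous], threshold=1.0))
```

Because the decision depends only on the raw batch statistics, it can be precomputed once. This matches the abstract's claim that the method adds no training overhead.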
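In this paper, we present a large-scale, multi-source, and unconstrained database called SDFE-LV for spotting the onset and offset frames of complete dynamic facial expressions in long videos. This task is known as dynamic facial expression spotting (DFES) and is an important preliminary step for many facial expression analysis tasks. Specifically, SDFE-LV consists of 1,191 long videos, each of which contains one or more complete dynamic facial expressions. Moreover, each complete dynamic facial expression in its corresponding long video was independently labeled five times by 10 well-trained annotators. To the best of our knowledge, SDFE-LV is the first unconstrained large-scale database for the DFES task whose long videos are collected from multiple real-world or near-real-world media sources, e.g., TV interviews, documentaries, movies, and we-media short videos. Therefore, DFES on the SDFE-LV database faces many practical difficulties, such as head pose variation, occlusion, and illumination. We also provide a comprehensive benchmark evaluation from different perspectives using many recent state-of-the-art deep spotting methods, so that researchers interested in DFES can get started quickly and easily. Finally, through an in-depth discussion of the experimental results, we point out several meaningful directions for tackling the DFES task and hope that DFES can advance further in the future. In addition, SDFE-LV will be freely released for academic use only as soon as possible.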
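Spotting results are usually scored by matching predicted (onset, offset) intervals to annotated ones. The sketch below uses a convention common in expression-spotting challenges such as MEGC: a prediction counts as a true positive when its temporal IoU with an unmatched ground-truth interval is at least 0.5, and F1 is computed from the counts. Whether the SDFE-LV benchmark uses exactly this threshold and greedy matching is an assumption.

```python
def interval_iou(a, b):
    """Temporal IoU of two inclusive frame intervals given as (onset, offset)."""
    inter = min(a[1], b[1]) - max(a[0], b[0]) + 1
    if inter <= 0:
        return 0.0
    union = (a[1] - a[0] + 1) + (b[1] - b[0] + 1) - inter
    return inter / union


def spotting_f1(predicted, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground truth, then F1 = 2TP / (2TP + FP + FN)."""
    matched = set()
    tp = 0
    for p in predicted:
        best, best_iou = None, 0.0
        for j, g in enumerate(ground_truth):
            if j in matched:
                continue
            iou = interval_iou(p, g)
            if iou > best_iou:
                best, best_iou = j, iou
        if best is not None and best_iou >= iou_threshold:
            matched.add(best)
            tp += 1
    fp = len(predicted) - tp
    fn = len(ground_truth) - tp
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


# One correct detection, one false alarm, one missed expression.
print(spotting_f1([(0, 9), (50, 59)], [(0, 9), (100, 110)]))
```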
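Deep neural networks trained with the standard cross-entropy loss are prone to memorizing noisy labels, which degrades their performance. Negative learning with complementary labels is more robust when noisy labels are present, but the model converges extremely slowly. In this paper, we first introduce a bidirectional learning scheme, in which positive learning ensures convergence speed while negative learning robustly copes with label noise. Further, a dynamic sample reweighting strategy is proposed to weaken the effect of noisily labeled samples by exploiting the strong discriminative ability of negative learning on the sample probability distribution. In addition, we incorporate self-distillation to further improve model performance. The code is available at https://github.com/chenchenzong/bldr.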
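Positive and negative learning can be contrasted directly. Positive learning minimizes -log p_y for the given (possibly noisy) label y. Negative learning draws a complementary label ȳ ≠ y uniformly at random and minimizes -log(1 - p_ȳ), which is rarely wrong even when y is noisy. This is a minimal sketch: simply summing the two terms, the coefficient `nl_coef`, and the per-sample `weight` are assumptions, not the paper's exact bidirectional or reweighting formulas.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def positive_loss(probs, label):
    """Standard cross-entropy: 'this sample IS class `label`'."""
    return -math.log(probs[label])


def negative_loss(probs, complementary_label, eps=1e-12):
    """Negative learning: 'this sample is NOT class `complementary_label`'."""
    return -math.log(max(1.0 - probs[complementary_label], eps))


def sample_complementary(label, num_classes, rng):
    """Draw a class uniformly from all classes except `label`."""
    choice = rng.randrange(num_classes - 1)
    return choice if choice < label else choice + 1


def bidirectional_loss(logits, label, rng, weight=1.0, nl_coef=1.0):
    """Hypothetical combination of both terms; `weight` stands in for a dynamic sample weight."""
    probs = softmax(logits)
    comp = sample_complementary(label, len(logits), rng)
    return weight * (positive_loss(probs, label) + nl_coef * negative_loss(probs, comp))


rng = random.Random(0)
print(bidirectional_loss([2.0, 0.1, -1.0, 0.5], label=0, rng=rng))
```

A sample whose label is probably noisy can be given a small `weight`, so its positive term, which is the one that memorizes the wrong label, contributes less to the gradient.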
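The rapid digitalization spurred by the COVID-19 pandemic has led to more cybercrime. Malware-as-a-service is now a booming business for cybercriminals. With the surge in malware activity, it is vital for cyber defenders to understand more about the malware samples they have at hand, as this information can greatly influence their next course of action during a breach. Recently, researchers have shown how malware family classification can be done by converting malware binaries into grayscale images and then classifying them with neural networks. However, most work focuses on studying the impact of different neural network architectures on classification performance. In the last year, researchers have shown that augmenting supervised learning with self-supervised learning can improve performance. Even more recently, Data2Vec was proposed as a modality-agnostic self-supervised framework for training neural networks. In this paper, we present BinImg2Vec, a framework for training malware binary image classifiers that incorporates both self-supervised and supervised learning to produce a model that consistently outperforms one trained only with supervised learning. We achieve a 4% improvement in classification performance and a 0.5% reduction in performance variance over multiple runs. We also show how our framework produces embeddings that cluster well, facilitating model explainability.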
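The binary-to-grayscale conversion mentioned above treats each byte of the file as one 8-bit pixel and wraps the byte stream into rows of fixed width. In the minimal sketch below, the default width of 256 and the zero-padding of the last row are assumptions. Some works instead choose the width based on the file size, and the image is usually resized before it reaches the network.

```python
import math


def binary_to_grayscale(data: bytes, width: int = 256):
    """Wrap raw bytes into rows of `width` pixels (values 0-255), zero-padding the last row."""
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data + bytes(height * width - len(data))
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


# Usage with a real file (path is hypothetical):
# with open("sample.bin", "rb") as f:
#     image = binary_to_grayscale(f.read())
print(binary_to_grayscale(bytes(range(10)), width=4))
```

Keeping the width fixed preserves spatial structure within each section of the binary. For example, padding regions show up as uniform bands, and code sections show up as textured regions that a CNN can learn to recognize.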
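In this paper, we develop a physics-informed neural network (PINN) model for parabolic problems with a sharply perturbed initial condition. As an example of a parabolic problem, we consider the advection-dispersion equation (ADE) with a point (Gaussian) source initial condition. In the $d$-dimensional ADE, perturbations in the initial condition decay with time $t$ as $t^{-d/2}$, which can cause large approximation errors in the PINN solution. Localized large gradients in the ADE solution make the Latin hypercube sampling of the equation's residual, which is common in PINNs, highly inefficient. Finally, the PINN solution of parabolic equations is sensitive to the choice of weights in the loss function. We propose a normalized form of the ADE in which the initial perturbation of the solution does not decrease in amplitude, and we demonstrate that this normalization significantly reduces the PINN approximation error. We propose criteria for the weights in the loss function that produce more accurate PINN solutions than weights selected by other methods. Finally, we propose an adaptive sampling scheme that significantly reduces the PINN solution error for the same number of sampling (residual) points. We demonstrate the accuracy of the proposed PINN model for forward, inverse, and backward ADEs.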
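The $t^{-d/2}$ decay can be checked against the exact solution for constant velocity $v$ and dispersion coefficient $D$. A unit-peak Gaussian pulse of initial width $\sigma_0$ stays Gaussian, with variance per dimension $\sigma_0^2 + 2Dt$ and peak amplitude $(\sigma_0^2/(\sigma_0^2 + 2Dt))^{d/2}$. The sketch below assumes constant isotropic $D$. Its "normalized" solution simply divides by that known peak to show why rescaling keeps the amplitude at order one. The paper's actual normalized ADE may be formulated differently.

```python
import math


def ade_gaussian(x, t, v, D, sigma0):
    """Exact solution of u_t + v . grad(u) = D * laplace(u) for a unit-peak Gaussian initial pulse.

    x and v are length-d sequences; the pulse moves with v and its
    per-dimension variance grows as sigma0**2 + 2*D*t.
    """
    d = len(x)
    var = sigma0 ** 2 + 2 * D * t
    r2 = sum((xi - vi * t) ** 2 for xi, vi in zip(x, v))
    return (sigma0 ** 2 / var) ** (d / 2) * math.exp(-r2 / (2 * var))


def peak_amplitude(t, d, D, sigma0):
    """Peak of the pulse; behaves like t**(-d/2) once 2*D*t >> sigma0**2."""
    return (sigma0 ** 2 / (sigma0 ** 2 + 2 * D * t)) ** (d / 2)


def normalized_solution(x, t, v, D, sigma0):
    """Hypothetical normalization: divide out the known peak decay so the pulse keeps unit amplitude."""
    return ade_gaussian(x, t, v, D, sigma0) / peak_amplitude(t, len(x), D, sigma0)


# In 3-D, quadrupling t cuts the peak by about 4**(-3/2) = 1/8.
print(peak_amplitude(400.0, 3, 1.0, 0.01) / peak_amplitude(100.0, 3, 1.0, 0.01))
```

A network that must represent values spanning many orders of magnitude over the time domain tends to fit the late, tiny-amplitude part poorly. Removing that known decay factor is what makes a normalized formulation easier for a PINN to learn.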